-
-
Notifications
You must be signed in to change notification settings - Fork 836
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(utils): linear scan is_uri
#2648
Conversation
Hi, just bumped into this while browsing. I'm not at all familiar with the project, so can't say if the performance improvement would outweigh the complexity cost, and what would be the effect in real workloads. However, if I'm not mistaken this change introduces two differences in the matching:
Because the function is checking for uri and not windows path, I think 2. is more correct now. Uri should not contain But 1. seems like a bug. I think the following (pseudocode) should fix 1. while keeping 2.: if filename[1] not in [a-zA-Z] then
return false
end
for i = 2, #filename do
if filename[i] == ':' then
return filename[i+1] ~= '\\'
elseif filename[i] not in [a-zA-Z0-9.+-] then
return false
end
end
return false |
@juntuu that pseudocode won't work afaict since there is no chance the |
Oops, good catch. Thanks! I flipped the conditions now, so the |
2d6b82a
to
cc9193e
Compare
Good catches to both! No significant performance difference between my original implementation and the suggested change.
|
cc9193e
to
8ea8b04
Compare
8ea8b04
to
a2d7603
Compare
I'm going to go ahead and merge this in. I think considering this is a pretty hot function, the performance gain, particularly on Windows is worth the diminished readability. |
Benchmarked against 80k+ unique paths, both Windows and Linux (macos should be close enough to llinux).
Approximately 13% faster for linux paths and 55% faster for windows paths.